Reliable quantitative analysis of immunohistochemically stained images requires accurate and robust cell detection and classification. Recent weakly supervised methods usually estimate probability density maps for cell recognition. However, in dense cell scenarios they can be limited by pre- and post-processing, since no universal parameter setting can be found. In this paper, we introduce an end-to-end framework that applies direct regression and classification on preset anchor points. Specifically, we propose a pyramid feature aggregation strategy that combines low-level features and high-level semantics simultaneously, which provides accurate cell recognition for our purely point-based model. In addition, an optimized cost function is designed to fit our multi-task learning framework by matching ground-truth and predicted points. Experimental results demonstrate the superior accuracy and efficiency of the proposed method, revealing great potential for assisting pathologists in their assessments.
translated by Google Translate
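The abstract above mentions matching ground-truth and predicted points but does not spell out the cost function, so the following is only a minimal sketch of the general idea: assign each annotated point to one predicted point so that the total distance is minimal. `match_points` and the brute-force search over permutations are illustrative assumptions (a practical implementation would use a Hungarian solver and likely fold classification scores into the cost).

```python
import itertools
import math

def match_points(pred_pts, gt_pts):
    """Brute-force minimum-cost one-to-one matching of ground-truth points to
    predicted points; feasible only for small point counts, shown here purely
    to illustrate the matching objective."""
    best, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(pred_pts)), len(gt_pts)):
        # perm[g] is the predicted point tentatively assigned to ground truth g.
        cost = sum(math.dist(pred_pts[p], gt_pts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = [(g, p) for g, p in enumerate(perm)], cost
    return best, best_cost
```

Each ground-truth cell then supervises exactly one anchor's regression and classification branches, which is what makes the multi-task loss well defined.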
Karyotyping is an important procedure for assessing the possible presence of chromosomal abnormalities. However, because of their non-rigid nature, chromosomes usually appear curved in microscopic images, and such deformed shapes hinder chromosome analysis by cytogeneticists. In this paper, we propose a self-attention guided framework to erase the curvature of chromosomes. The proposed framework extracts spatial information and local textures to preserve banding patterns in a regression module. With complementary information from the curved chromosome, a refinement module is designed to further improve fine details. In addition, we propose two dedicated geometric constraints to maintain the length of chromosomes and restore their distortion. To train our framework, we create a synthetic dataset in which curved chromosomes are generated from real-world straight chromosomes by grid deformation. Quantitative and qualitative experiments are conducted on both synthetic and real-world data. Experimental results show that our proposed method can effectively straighten curved chromosomes while keeping banding details and length.
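The exact form of the two geometric constraints is not given in the abstract; as an illustrative assumption, a length-preservation term can be sketched by comparing the arc length of the chromosome centerline before and after straightening (`polyline_length` and `length_constraint` are hypothetical names, not the paper's implementation):

```python
import math

def polyline_length(pts):
    # Arc length of a chromosome centerline given as an ordered list of 2-D points.
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def length_constraint(curved_pts, straightened_pts):
    # Penalize any difference between pre- and post-straightening centerline length,
    # so the network cannot stretch or shrink the chromosome while unbending it.
    return abs(polyline_length(curved_pts) - polyline_length(straightened_pts))
```

A perfectly straightened output lies on a line whose length equals the curved arc length, driving this penalty to zero.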
When designing diagnostic models for clinical applications, it is crucial to ensure that the models are robust to a wide range of image corruptions. Herein, an easy-to-use benchmark is established to evaluate how neural networks perform on corrupted pathology images. Specifically, corrupted images are generated by injecting nine types of common corruptions into validation images. In addition, two classification metrics and one ranking metric are designed to evaluate prediction and confidence performance under corruption. Evaluated on the two resulting benchmark datasets, we find that (1) a variety of deep neural network models suffer a significant accuracy drop (double the error on clean images) and unreliable confidence estimation on corrupted images; (2) the correlation between validation and test errors is low, while replacing the validation set with our benchmark can increase the correlation. Our code is available at https://github.com/superjamessyx/robustness_benchmark.
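The abstract names its metrics but does not define them; as an illustrative assumption, here is the Corruption Error (CE) normalization commonly used by corruption-robustness benchmarks of this kind, where a model's errors across severity levels are divided by a baseline model's errors at the same severities:

```python
def corruption_error(model_errs, baseline_errs):
    """CE for one corruption type: the model's classification errors across
    severity levels, normalized by a baseline model's errors per severity."""
    ratios = [m / b for m, b in zip(model_errs, baseline_errs)]
    return sum(ratios) / len(ratios)

def mean_corruption_error(per_corruption_errs, per_corruption_baseline):
    # mCE: average the per-corruption CE over all corruption types.
    ces = [corruption_error(m, b)
           for m, b in zip(per_corruption_errs, per_corruption_baseline)]
    return sum(ces) / len(ces)
```

A CE of 1.0 means the model degrades exactly as the baseline does; values below 1.0 indicate better corruption robustness.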
3D object detection has received increasing attention in autonomous driving recently. Objects in 3D scenes are distributed with diverse orientations. Ordinary detectors do not explicitly model the variations of rotation and reflection transformations. Consequently, large networks and extensive data augmentation are required for robust detection. Recent equivariant networks explicitly model the transformation variations by applying shared networks on multiple transformed point clouds, showing great potential in object geometry modeling. However, it is difficult to apply such networks to 3D object detection in autonomous driving due to their large computation cost and slow inference speed. In this work, we present TED, an efficient Transformation-Equivariant 3D Detector, to overcome the computation cost and speed issues. TED first applies a sparse convolution backbone to extract multi-channel transformation-equivariant voxel features, and then aligns and aggregates these equivariant features into lightweight and compact representations for high-performance 3D object detection. On the highly competitive KITTI 3D car detection leaderboard, TED ranked 1st among all submissions with competitive efficiency.
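TED's sparse-convolution backbone and feature alignment are beyond an abstract-level sketch, but the underlying idea of the equivariant networks it builds on can be shown in miniature: apply one shared extractor to several rotated copies of a point cloud, then aggregate the per-rotation features into one compact representation. `rotate_z`, the element-wise max aggregation, and the toy extractor below are illustrative assumptions, not TED's actual architecture.

```python
import math

def rotate_z(points, angle):
    # Rotate 3-D points around the vertical (z) axis.
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def transformation_equivariant_features(points, extractor, n_rot=4):
    """Apply one shared feature extractor to n_rot rotated copies of a point
    cloud, then aggregate the per-rotation features by element-wise max so the
    result is invariant to which of the rotations the input arrived in."""
    feats = [extractor(rotate_z(points, 2.0 * math.pi * k / n_rot))
             for k in range(n_rot)]
    return [max(channel) for channel in zip(*feats)]
```

The cost concern the abstract raises is visible even here: the shared extractor runs once per transformation, which is what TED's aligned, compact representations are designed to amortize.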
We propose Human-centered 4D Scene Capture (HSC4D) to accurately and efficiently create a dynamic digital world containing large-scale indoor scenes, diverse human motions, and rich interactions between humans and environments. Using only body-mounted IMUs and LiDAR, HSC4D is free of any external devices' constraints and map-free, requiring no pre-built maps. Considering that IMUs can capture human poses but always drift over long-term use, while LiDAR is stable for global localization but rough for local positions and orientations, HSC4D makes the two sensors complement each other through a joint optimization and achieves promising results for long-term capture. Relationships between humans and environments are also explored to make their interactions more realistic. To facilitate many downstream tasks, such as AR, VR, robotics, and autonomous driving, we propose a dataset containing three large scenes (1k-5k $m^2$) with accurate dynamic human motions and locations. Diverse scenes (climbing gym, multi-story building, slope, etc.) and challenging human activities (exercising, walking up and down stairs, climbing, etc.) demonstrate the effectiveness and generalization ability of HSC4D. The dataset and code are available at http://www.lidarhumanmotion.net/hsc4d/.
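HSC4D's joint optimization is far richer than this, but the reason the two sensors pair well can be shown with a 1-D complementary-filter toy: IMU increments are integrated (accurate locally, drifting globally), and the stable LiDAR localization pulls the estimate back. `fuse_trajectory` and the blend factor `alpha` are illustrative assumptions, not the paper's optimizer.

```python
def fuse_trajectory(imu_increments, lidar_positions, alpha=0.5):
    """1-D complementary fusion: dead-reckon IMU motion increments, then pull
    the estimate toward the globally stable LiDAR position at each step."""
    pos, fused = 0.0, []
    for inc, lidar in zip(imu_increments, lidar_positions):
        pos += inc                                  # IMU dead-reckoning (drifts)
        pos = alpha * pos + (1.0 - alpha) * lidar   # correct with LiDAR fix
        fused.append(pos)
    return fused
```

With a biased IMU, the raw integral diverges linearly, while the fused estimate stays bounded near the LiDAR track, which is the intuition behind long-term capture with this sensor pair.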
Benefiting from its intrinsic capability to exploit supervision information, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing away those from other clusters, by maximizing and minimizing the cross-view cosine similarity of positive and negative samples, respectively. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
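The objective described above can be sketched directly from the abstract: maximize cross-view cosine similarity for positive pairs (same high-confidence cluster) and minimize it for pairs of different cluster centers used as negatives. This is only a minimal sketch of that objective; the function names and the simple difference-of-means form are assumptions, not CCGC's exact loss.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_contrastive_loss(positive_pairs, center_pairs):
    """Pull cross-view embeddings of the same high-confidence cluster together
    and push apart the centers of different clusters (the negatives)."""
    pos = sum(cosine(a, b) for a, b in positive_pairs) / len(positive_pairs)
    neg = sum(cosine(a, b) for a, b in center_pairs) / len(center_pairs)
    return neg - pos  # minimized when positives align and centers separate
```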
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, significantly improving their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted for reconstruction and generation of the desired high-resolution image. Experimental results and comparisons show that our proposed method generates higher-quality supersampling results than current state-of-the-art methods, without increasing the total number of ray-traced samples.
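The actual method interpolates learned features with a network, but the key insight, that ray-traced pixels are trusted anchors and only untraced pixels need filling, can be shown in a toy 1-D form. `fill_untraced_pixels` is an illustrative stand-in for the mask-reinforced reconstruction, not the paper's network.

```python
def fill_untraced_pixels(samples, mask):
    """Keep ray-traced pixels (mask truthy) unchanged; fill each untraced pixel
    by averaging its nearest traced neighbors on either side."""
    out = list(samples)
    for i, traced in enumerate(mask):
        if traced:
            continue
        # Nearest traced neighbor to the left and right, if any.
        left = next((samples[j] for j in range(i - 1, -1, -1) if mask[j]), None)
        right = next((samples[j] for j in range(i + 1, len(mask)) if mask[j]), None)
        neighbors = [v for v in (left, right) if v is not None]
        out[i] = sum(neighbors) / len(neighbors)
    return out
```

Because the traced values pass through untouched, the problem reduces to interpolation rather than hallucination, which is exactly the framing the abstract argues for.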
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as new boundaries. 2) Reasoning bias: such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such a mechanism is also able to supplement the absent consecutive visual semantics of the sparsely sampled frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
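The abstract's "soft labels on boundaries" admit a simple illustrative form: instead of a hard one-hot label at the annotated boundary frame, spread probability mass to nearby frames. The Gaussian shape and `soft_boundary_labels` below are assumptions for illustration; the paper generates its soft labels with the learned reasoning strategy.

```python
import math

def soft_boundary_labels(n_frames, boundary_idx, sigma=1.0):
    """Gaussian-smoothed boundary supervision over n_frames sampled frames:
    the annotated boundary frame gets the most mass, neighbors get less, so
    a downsampling shift of one frame is penalized only mildly."""
    raw = [math.exp(-((i - boundary_idx) ** 2) / (2.0 * sigma ** 2))
           for i in range(n_frames)]
    total = sum(raw)
    return [v / total for v in raw]  # normalize to a probability distribution
```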
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occlusion areas. Our approach $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, which is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
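The per-sample occlusion weight described above can be sketched on top of standard volume-rendering compositing: each ray sample contributes a color that is a convex blend of its static-background and dynamic-object radiance, weighted by `occ_w`. The convex-blend form and `composite_ray` are illustrative assumptions; in $\text{D}^4$NeRF the weight is learned and the two components come from separate networks.

```python
import math

def composite_ray(samples, deltas):
    """Standard volume-rendering compositing along one ray. Each sample is
    (static_rgb, dynamic_rgb, density, occ_w), where occ_w in [0, 1] is the
    share of the sample's radiance attributed to the dynamic component."""
    rgb, transmittance = [0.0, 0.0, 0.0], 1.0
    for (s_rgb, d_rgb, density, occ_w), delta in zip(samples, deltas):
        # Blend static and dynamic radiance by the occlusion weight.
        color = [occ_w * d + (1.0 - occ_w) * s for s, d in zip(s_rgb, d_rgb)]
        alpha = 1.0 - math.exp(-density * delta)   # opacity of this segment
        weight = transmittance * alpha             # contribution to the pixel
        rgb = [r + weight * c for r, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha               # light remaining behind it
    return rgb, transmittance
```

Setting `occ_w` to zero everywhere recovers a purely static rendering, which is how such a weight lets the background be detached from the dynamic scene.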
Deploying reliable deep learning techniques in interdisciplinary applications needs learned models to output accurate and ({even more importantly}) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We make the opposite claim that explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in those neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network dubbed NeuroExplainer, with applications to uncovering altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) during network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer leads to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
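The abstract names fidelity, sparsity, and stability as explainability metrics without defining them; as an illustrative assumption, one common form of attention sparsity is the fraction of locations receiving negligible attention mass (a sparser, more focused explanation scores higher). `attention_sparsity` and the threshold `eps` are hypothetical, not the paper's definition.

```python
def attention_sparsity(attention, eps=0.1):
    """Fraction of locations whose normalized attention falls below eps.
    A focused attention map concentrates mass on few locations and scores
    near 1; a uniform (diffuse) map scores near 0."""
    total = sum(attention)
    normalized = [a / total for a in attention]
    return sum(1 for a in normalized if a < eps) / len(normalized)
```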